Methods for Presentation and
Display of Multivariate Data

I. Introduction

This report deals with the development of methods for the presentation
or display of multivariate data. Multivariate analyses often involve a
rather complicated data structure, and the analyst is often confronted
with the prospect of merely quoting a test statistic and a corresponding
significance level as the sole product of the analysis. Multivariate
data in which certain factors are varied and several responses are
measured are difficult to interpret, and the sheer volume suggests that
considerable data display is warranted: not mere data tabulation, but
displays that aid in the interpretation of the analysis. In this report
we suggest and illustrate various data displays, with emphasis on
multivariate analysis of variance problems. The usual Hotelling's $T^2$
solution in the two sample case is a special case and thus receives
special emphasis, with an illustrative example.


II. The Two Sample Problem 


The two sample problem is designed to test

    $$ H_0: \mu_1 = \mu_2 \quad \text{against} \quad H_1: \mu_1 \neq \mu_2 $$

where $\mu_1$ and $\mu_2$ are population mean vectors that represent the
means of p responses under two conditions. For example, in NASA
applications these might be the means of several correlated responses to
experiments under two different panel displays or two different
conditions of G-seat (on or off). The total data array involves $n_1$
p-dimensional vectors under condition 1 and $n_2$ vectors under
condition 2. Mean vectors $\bar{x}_1$ and $\bar{x}_2$ are computed as

    $$ \bar{x}_1 = (\bar{x}_{11}, \bar{x}_{12}, \ldots, \bar{x}_{1p})' $$

    $$ \bar{x}_2 = (\bar{x}_{21}, \bar{x}_{22}, \ldots, \bar{x}_{2p})' $$

where $\bar{x}_{1j}$ represents the average of the observations under
condition 1, response j; $\bar{x}_{2j}$ represents a similar quantity
for condition 2. One generally assumes that the two multivariate
populations share a common variance-covariance matrix $\Sigma$, and an
empirical estimate is obtained by pooling the variance-covariance
estimates from the two samples. We shall call this pooled estimate S
($n_1 + n_2 - 2$ degrees of freedom). The test statistic for testing
$H_0$ is given by

    $$ T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' S^{-1} (\bar{x}_1 - \bar{x}_2) $$

which is Hotelling's $T^2$. An F-variate is then used to carry out the
mechanics of the test.
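
As a minimal sketch of the computation just described, the following
Python fragment forms the pooled estimate S, Hotelling's $T^2$, and the
F-variate used to carry out the test. The function name and the use of
NumPy are our own assumptions for illustration; this is not the code
used to produce the results in this report.

```python
import numpy as np

def hotelling_t2(x1, x2):
    """Two-sample Hotelling T^2 with a pooled covariance estimate.

    x1, x2: arrays of shape (n1, p) and (n2, p), one row per observation.
    Returns (T2, F, df1, df2), where F is referred to F(df1, df2).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    nu = n1 + n2 - 2                      # degrees of freedom of S
    d = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled variance-covariance estimate S.
    s = ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / nu
    s = np.atleast_2d(s)                  # keeps the p = 1 case working
    t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(s, d)
    f = (nu - p + 1) / (nu * p) * t2      # (n1+n2-p-1)/((n1+n2-2)p) * T^2
    return t2, f, p, nu - p + 1
```

With a single response (p = 1) the statistic reduces to the square of
the usual pooled two-sample t, which gives a convenient check.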




(a) Need for Data Display

Generally the user of the Hotelling's $T^2$ is interested not only in
whether the above hypothesis is accepted or rejected, but also in the
relative roles of the responses. The analyst should also gain some
insight concerning the correlation of the responses and what effect
this correlation is having on the test procedure. It would also seem
reasonable that some light should be shed on the question of whether
some of the responses can be ignored entirely, i.e., whether the total
dimensionality of the problem can be substantially reduced.


There are analytic procedures to aid in answering the questions
mentioned here. However, like the Hotelling's $T^2$ or F-statistic, they
involve the computation of numbers that are somewhat difficult to
interpret, and there is very little in the standard multivariate
analysis that leads one to interpretation through plots, pictorial data
displays, or informative tables. Here we shall suggest a few procedures
and corresponding data displays that will hopefully shed light on
interpretation and also complement the conclusion derived from the test
statistic. Most of the procedures suggested here center around
discriminant analysis (stepwise discriminant analysis) and the
computation of partial correlation coefficients. Thus we shall proceed
to give a brief summary of these two concepts.


(b) Discriminant Analysis 

The procedure of discriminant analysis is designed to generate the
linear combination $a'x$ which best separates or discriminates between
the two conditions or treatments in the two sample problem. The
multivariate hypothesis is handled by considering the univariate
hypothesis

    $$ H_0: a'\mu_1 = a'\mu_2 $$

for which a univariate t-test is conducted, and the test is based on the
largest $t^2(a)$. Thus a is determined for which the $t^2$ value

    $$ t^2(a) = \frac{n_1 n_2}{n_1 + n_2} \, \frac{[a'(\bar{x}_1 - \bar{x}_2)]^2}{a'Sa} \qquad (2.1) $$


is maximized. As we outlined in an earlier task, the maximized $t^2(a)$
is Hotelling's $T^2$, which has the structure of an F-distributed
variate, and thus the statistic

    $$ \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2)\,p} \, T^2 $$

follows $F_{p,\, n_1 + n_2 - p - 1}$ under $H_0$.

One of the important facets of this analysis is the ability to reduce
the dimensionality of the problem by eliminating responses that are
either redundant or provide little in the way of separation of the two
groups. Many statistical packages provide a stepwise or stagewise
discriminant analysis that allows the user to follow pictorially the
reduction in dimensionality. It also gives the user insight into which
responses are relevant, the minimum number of responses needed to
describe the separation between the groups, and the correlation
structure among the responses. We shall now proceed to discuss the
stepwise algorithm and discuss the "data display" aspect later.
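
The maximizing property of (2.1) can be checked numerically. The sketch
below assumes the standard result that the maximizing vector is
proportional to $S^{-1}(\bar{x}_1 - \bar{x}_2)$, so that the maximum of
$t^2(a)$ equals $T^2$. The function names are hypothetical and NumPy is
our choice for illustration.

```python
import numpy as np

def pooled_cov(x1, x2):
    """Pooled variance-covariance estimate S on n1 + n2 - 2 d.f."""
    n1, n2 = len(x1), len(x2)
    s = ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False))
    return np.atleast_2d(s / (n1 + n2 - 2))

def t2_of(a, x1, x2):
    """Squared pooled t statistic for the combination a'x, as in (2.1)."""
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0)
    s = pooled_cov(x1, x2)
    return (n1 * n2 / (n1 + n2)) * (a @ d) ** 2 / (a @ s @ a)

def discriminant_direction(x1, x2):
    """Coefficient vector proportional to S^{-1}(xbar_1 - xbar_2)."""
    d = x1.mean(axis=0) - x2.mean(axis=0)
    return np.linalg.solve(pooled_cov(x1, x2), d)
```

Any other coefficient vector gives a $t^2$ value no larger than the one
obtained from `discriminant_direction`, and rescaling a leaves
$t^2(a)$ unchanged, so only the relative sizes of the coefficients
matter.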



Stepwise discriminant analysis proceeds much like stepwise regression,
at least conceptually. One sequentially brings responses into the
multivariate model, each time looking at the contribution of each
response in terms of its ability to provide separation between the
groups or between the treatments. The criterion for including a
response is very intriguing.

Suppose, for example, there are four responses and two groups (or
treatments). The forward stepwise procedure begins by entering the
response that provides the largest univariate t separating the two
groups. Call this variable response 1. For step 2 the methodology
treats the model as if the new candidate variable is a univariate
response and response 1 is a covariate in an analysis of covariance,
with treatment effects representing effects due to the two groups. The
response included is the one that provides the largest F comparing
treatments in this analysis of covariance model, which would be written
(for response k)

    $$ y_{ij}^{(k)} = \mu^{(k)} + \tau_i^{(k)} + \beta^{(k)} y_{ij}^{(1)} + \epsilon_{ij}^{(k)}, \quad i = 1, 2; \; j = 1, \ldots, n_i $$

where the superscript denotes the response in question, $\tau_i^{(k)}$
is the effect of treatment i, and $\beta^{(k)}$ is the regression
coefficient on the covariate, response 1. The response k is chosen
which separates the treatments by the largest amount, adjusted for the
initial response 1. This procedure is continued, and each time the
responses entered in previous steps become covariates in succeeding
steps. The procedure continues until the response to be entered at a
specific step is not significant at some specified level. In addition,
the procedure will eliminate responses that have entered at previous
steps if, in light of other responses, they cease to be significant.
This would imply that the separation of the two groups provided by that
variable is redundant and it is not needed as it was at an earlier
stage.

The backward elimination procedure proceeds in the same way as the
forward procedure, except that the method begins with all responses in
the model and eliminates them one at a time. This is sometimes
preferable to forward stepwise procedures.
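
The forward entry step described above can be sketched as follows. At
each step every candidate response is treated as a univariate response
in an analysis of covariance, with the responses already entered acting
as covariates, and the candidate with the largest F for the group
effect is entered. This is only an illustration under our own
assumptions: the function names are hypothetical, a fixed F-to-enter
threshold of 4.0 stands in for "significant at some specified level",
and the removal of previously entered responses is not implemented. It
is not the BMD algorithm.

```python
import numpy as np

def ancova_f(y, group, covariates):
    """F statistic for the group effect on y, adjusted for covariates.

    y: (N,) response; group: (N,) array of 0/1 labels;
    covariates: (N, q) responses already in the model (q may be 0).
    Returns (F, df_denominator); the numerator has 1 degree of freedom.
    """
    n = len(y)
    reduced = np.column_stack([np.ones(n), covariates])   # no group term
    full = np.column_stack([reduced, group])               # with group term

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        return resid @ resid

    df = n - full.shape[1]
    rss_full = rss(full)
    return (rss(reduced) - rss_full) / (rss_full / df), df

def forward_stepwise(x, group, f_to_enter=4.0):
    """Enter responses one at a time while the best F exceeds f_to_enter.

    Returns a list of (response index, F at entry) in order of entry.
    """
    p = x.shape[1]
    entered, history = [], []
    while len(entered) < p:
        candidates = [k for k in range(p) if k not in entered]
        covs = x[:, np.array(entered, dtype=int)]
        scores = [ancova_f(x[:, k], group, covs)[0] for k in candidates]
        best = int(np.argmax(scores))
        if scores[best] < f_to_enter:
            break                       # no remaining response qualifies
        entered.append(candidates[best])
        history.append((candidates[best], scores[best]))
    return history
```

At the first step there are no covariates, so the F for each candidate
is simply the square of its univariate pooled t, in agreement with the
description of step 1.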

III. Possible Data Display 

Most data analysts and members of the scientific community can
identify with such displays as group means, correlation coefficients,
significance levels of tests, etc. Our suggested data displays involve
plots and tables that center around these concepts, with a view toward
illustrating the answers to two questions.

(i) Is there an appreciable change in the responses as you go
from treatment 1 (say G-seat off) to treatment 2 (G-seat on)?

(ii) What is the true dimensionality of the problem, i.e., how many
responses are truly affected and which responses play the important
roles?

(a) Partial Correlation Coefficients 

These measures of linear association are quite different from the
usual simple correlation coefficients [1] ordinarily observed in a
multivariate analysis, in that they measure the degree of linear
association between two responses, conditional on the others. A partial
correlation expresses how much correlation exists between the two
responses when all the others are held fixed. This is meant to give the
user, at the outset, some indication of which linear associations among
the responses might create problems in reducing the dimensionality of
the problem.
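
One common way to obtain all the partial correlations at once, sketched
here under our own assumptions, is from the inverse of the covariance
matrix: if P is the inverse of S, the partial correlation between
responses i and j given all the others is $-p_{ij}/\sqrt{p_{ii}p_{jj}}$.
The function name is hypothetical; for within-group partial
correlations the pooled estimate S would be supplied.

```python
import numpy as np

def partial_correlations(s):
    """Partial correlation of each pair of responses given all others.

    s: (p, p) covariance or correlation matrix, assumed nonsingular.
    Returns a (p, p) matrix with ones on the diagonal.
    """
    prec = np.linalg.inv(np.asarray(s, dtype=float))   # P = S^{-1}
    scale = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial
```

With only two responses there is nothing to hold fixed, and the result
reduces to the simple correlation coefficient.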

(b) Plots of Sample Means

In the two sample case it is particularly enlightening to plot the
sample means using standardized observations, and doing so requires
very little data preparation. This is not intended as a direct vehicle
for statistical inference but rather, along with the partial
correlations, as an initial display. The example in the next section
will illustrate this preliminary display.
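
The quantities for such a plot might be prepared as in the sketch
below. The report does not state which standardization was used; here
we assume, purely for illustration, that each response is centered at
its overall mean and scaled by its pooled within-group standard
deviation. The function name is hypothetical.

```python
import numpy as np

def standardized_group_means(x1, x2):
    """Group means of each response after standardization.

    Assumption: center at the overall mean of the combined sample and
    scale by the pooled within-group standard deviation.
    x1, x2: arrays of shape (n1, p) and (n2, p).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    grand = np.vstack([x1, x2]).mean(axis=0)
    pooled_var = ((n1 - 1) * x1.var(axis=0, ddof=1)
                  + (n2 - 1) * x2.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    sd = np.sqrt(pooled_var)
    return (x1.mean(axis=0) - grand) / sd, (x2.mean(axis=0) - grand) / sd
```

Plotting the two returned vectors against the response number gives one
profile per group on a common scale; responses where the profiles lie
far apart are candidates for providing separation.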

(c) Plots Resulting from Stepwise Discriminant Analysis 

The major data display would be a product of the stepwise discriminant
analysis and should illustrate the important responses, the reduction
in dimensionality of the problem, and the statistical significance
associated with the difference between the two treatments.

Output from the forward stepwise discriminant analysis at each stage
includes an F-statistic indicating the significance of the incoming
response, with a corresponding level of significance, and an
F-statistic (or Hotelling's $T^2$) indicating the significance of the
difference between the two groups at this stage (the degree of
separation with the responses present in the current stage). Displays
of interest would be plots showing these significance levels as a
function of the stage of the stepwise procedure, the latter also being
the number of responses currently in the multivariate model. With these
plots the user (and hence the reader) can see step by step the role of
each response as it enters the picture. The displays show which
responses are critical and at what point additional responses provide
no more separation between the two treatments. This will become clearer
with our example in the next section.
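
The overall test at each stage can be sketched as below: for the first
q responses in a given order of entry, Hotelling's $T^2$ is computed on
those q responses alone and converted to its F-variate. The function
name and the `order` argument (a list of response indices in order of
entry) are hypothetical, and the sketch is not the BMD output routine.
The significance level to be plotted is the upper tail area of the
F distribution at the returned statistic and degrees of freedom.

```python
import numpy as np

def overall_f_by_stage(x1, x2, order):
    """Overall two-group F statistic after each stage of entry.

    x1, x2: arrays of shape (n1, p) and (n2, p); order: response
    indices in order of entry. Returns a list of (q, F, df1, df2).
    """
    n1, n2 = len(x1), len(x2)
    nu = n1 + n2 - 2
    stages = []
    for q in range(1, len(order) + 1):
        cols = list(order[:q])
        a, b = x1[:, cols], x2[:, cols]
        d = a.mean(axis=0) - b.mean(axis=0)
        s = np.atleast_2d(((n1 - 1) * np.cov(a, rowvar=False)
                           + (n2 - 1) * np.cov(b, rowvar=False)) / nu)
        t2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(s, d)
        # F-variate for T^2 on q responses: F(q, n1 + n2 - q - 1).
        stages.append((q, (nu - q + 1) / (nu * q) * t2, q, nu - q + 1))
    return stages
```

$T^2$ itself never decreases as responses are added, but the F-variate
divides by q, so a response that adds little separation can lower the
F statistic and raise the plotted significance level.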

IV. Example


The example we use to illustrate the data display features a subset
of NASA data in which there are 27 data points in each of two groups
(G-seat on and G-seat off) and six responses are measured. The purpose
of the experiment is to determine whether there is a difference "on the
average" between the two groups or treatments, then to determine
whether this difference is explained by one, two, three, or perhaps
more responses, and to indicate which responses these are. It would be
advantageous for the analysis to display illustrations that point out
these results. The original data are given in Table I. Table II gives
the partial correlation coefficients. Any strong partial correlation
would give a clear indication of redundancy between two responses
despite the activity of the other responses. Here the only responses
showing a strong partial linear association are responses 2 and 3,
while responses 4 and 5 show at least a moderate partial linear
association. Incidentally, this partial correlation is taken within the
group conditions (i.e., seat off and seat on).

To illustrate the analysis, Figures 1, 2, and 3 should be consulted.
Figure 1 is a simple plot of the means of the two groups for the six
responses, using standardized data. It is clear from the display that
variables 2 and 6 supply a goodly portion of the separation between the
two seat conditions. The next phase of the data display and analysis
illustrates the significance tests in the stepwise procedure. As each
variable enters the model, essentially the hypothesis

    $H_0$: variable entered does not provide any increase in
           separation between the groups

is being tested against

    $H_1$: variable provides an increase in separation

through the mechanism described earlier. Figure 2 provides a plot of
the significance level associated with each variable as it enters
sequentially. Small values are evidence in favor of $H_1$ above.
Clearly some subjectivity must be used by the analyst in deciding at
what point no further responses provide additional separation.


Here, it is clear that

(a) Response 2 provides a substantial separation between the two
G-seat conditions.

(b) Response 6 significantly increases the separation between the
two conditions.

(c) No additional responses provide significant separation beyond
these two.

Figure 3 is a bit more difficult to interpret but still provides
essential information. The basic analysis at each stage of the stepwise
procedure is to conduct the Hotelling's $T^2$ test (F-statistic) to
determine if the two seat conditions differ on the average across the
responses in the model at that stage, i.e., the test is of the
hypothesis

    $$ H_0: \mu_1 = \mu_2 $$

as described earlier in this report. Plotted in the figure is the
significance level of that test at each stage of the stepwise
discriminant analysis. The indication is that at every step the
separation between the seat conditions is significant. However, at
step 2, with the entry of variable 6 in the presence of variable 2, the
separation is enhanced, whereas in succeeding stages there is an
apparent "dilution" of this significance due to the addition of
non-discriminating responses. Ideally, one might consider (rather
loosely) that the minimum point in the plot indicates the smallest
adequate subset of responses.

The computer software for this work was the BMD package. It provides
all the tests described as well as the partial correlation
coefficients.